Abstract

This note presents a unified analysis of the recovery of simple objects from random linear
measurements. When the linear functionals are Gaussian, we show that an $s$-sparse vector in $\mathbb{R}^n$
can be efficiently recovered from $2s \log n$ measurements with high probability and a rank $r$, $n \times n$
matrix can be efficiently recovered from $r(6n - 5r)$ measurements with high probability. For
sparse vectors, this is within an additive factor of the best known nonasymptotic bounds. For
low-rank matrices, this matches the best known bounds. We present a parallel analysis for
block-sparse vectors obtaining similarly tight bounds. In the case of sparse and block-sparse signals,
we additionally demonstrate that our bounds are only slightly weakened when the measurement
map is a random sign matrix. Our results are based on analyzing a particular dual point which
certifies optimality conditions of the respective convex programming problem. Our calculations
rely only on standard large deviation inequalities and our analysis is self-contained.



Keywords. $\ell_1$-norm minimization, nuclear-norm minimization, block-sparsity, duality, random
matrices.
1 Introduction

The past decade has witnessed a revolution in convex optimization algorithms for recovering struc- 
tured models from highly incomplete information. Work in compressed sensing has shown that 
when a vector is sparse, then it can be reconstructed from a number of nonadaptive linear measure- 
ments proportional to a logarithmic factor times the signal's sparsity level [HE]. Building on this 
work, many have recently demonstrated that if an array of user data has low-rank, then the matrix 
can be re-assembled from a sampling of information proportional to the number of parameters 
required to specify a low-rank factorization. See [2, 3, 19] for some early references on this topic.

Sometimes, one would like to know precisely how many measurements are needed to recover an
$s$-sparse vector (a vector with at most $s$ nonzero entries) by $\ell_1$ minimization or a rank-$r$ matrix by
nuclear-norm minimization. This of course depends on the kind of measurements one is allowed to
take, and can be empirically determined or approximated by means of numerical studies. At the
theoretical level, however, very precise answers (e.g., perfect knowledge of numerical constants) for
models of general interest may be very hard to obtain. For instance, in [4], the authors demonstrated
that about $20 s \log n$ randomly selected Fourier coefficients were sufficient to recover an $s$-sparse
signal, but determining the minimum number that would suffice appears to be a very difficult
question. Likewise, obtaining precise theoretical knowledge about the number of randomly selected
entries required to recover a rank-$r$ matrix by convex programming seems delicate, to say the least.



* Mathematics and Statistics Departments, Stanford University, Stanford, CA 94305. candes@stanford.edu 
† Computer Sciences Department, University of Wisconsin-Madison, Madison, WI, 53706. brecht@cs.wisc.edu






For some special and idealized models, however, this is far easier and the purpose of this note is to 
make this clear. 

In this note, we demonstrate that many bounds concerning Gaussian measurements can be
derived via elementary, direct methods using Lagrangian duality. By a careful analysis of a particular
Lagrange multiplier, we are able to prove that $2s \log n$ measurements are sufficient to recover an
$s$-sparse vector in $\mathbb{R}^n$ and $r(6n - 5r)$ measurements are sufficient to recover a rank $r$, $n \times n$ matrix
with high probability. These almost match the best-known, non-asymptotic bounds for sparse
vector reconstruction ($2s \log(n/s) + \frac{5}{4} s$ measurements [HE]), and match the best known bounds for
low-rank matrix recovery in the nuclear norm (as reported in [5, 16]).

The work [5], cited above, presents a unified view of the convex programming approach to inverse
problems and provides a relatively simple framework to derive exact, robust recovery bounds for a
variety of simple models. As we already mentioned, the authors also provide rather tight bounds on
sparse vector and low-rank matrix recovery in the Gaussian measurement ensemble by using a deep
theorem in functional analysis due to Gordon, which concerns the intersection of random subspaces
with subsets of the sphere. Gordon's Theorem has also been used to provide sharp estimates of
the phase transitions for the $\ell_1$ and nuclear norm heuristics in [20] and [16] respectively. Our work
complements these results, demonstrating that the dual multiplier ansatz proposed in [10] can also
yield very tight bounds for many signal recovery problems.

To introduce our results, suppose we are given information about an object $x_0 \in \mathbb{R}^n$ of the
form $\Phi x_0 \in \mathbb{R}^m$ where $\Phi$ is an $m \times n$ matrix. When $\Phi$ has entries i.i.d. sampled from a Gaussian
distribution with mean 0 and variance $1/m$, we call it a Gaussian measurement map. We want
bounds on the number of rows $m$ of $\Phi$ to ensure that $x_0$ is the unique minimizer of the problem

$$\begin{array}{ll} \text{minimize} & \|x\|_{\mathcal{A}} \\ \text{subject to} & \Phi x = \Phi x_0. \end{array} \qquad (1.1)$$

Here $\|\cdot\|_{\mathcal{A}}$ is a norm with some suitable properties which encourage solutions which conform to
some notion of simplicity. Our first result is the following.

Theorem 1.1 Let $x_0$ be an arbitrary $s$-sparse vector and $\|\cdot\|_{\mathcal{A}}$ be the $\ell_1$ norm. Let $\beta > 1$.

• For Gaussian measurement maps $\Phi$ with $m \ge 2\beta s \log n + s$, the recovery is exact with
probability at least $1 - 2n^{-f(\beta, s)}$ where

$$f(\beta, s) = \left[ \sqrt{\frac{\beta}{2s} + \beta - 1} - \sqrt{\frac{\beta}{2s}} \right]^2.$$

• Let $\varepsilon \in (0,1)$. For binary measurement maps $\Phi$ with i.i.d. entries taking on values $\pm m^{-1/2}$
with equal probability, there exist numerical constants $c_0$ and $c_1$ such that if $n \ge \exp(c_0/\varepsilon^2)$
and $m \ge 2\beta(1 - \varepsilon)^{-2} s \log n + s$, the recovery is exact with probability at least $1 - n^{1-\beta} - n^{-c_1 \beta \varepsilon}$.

The algebraic expression $f(\beta, s)$ is positive for all $\beta > 1$ and $s > 0$. For all fixed $\beta > 1$, $f(\beta, s)$ is an
increasing function of $s$ so that $\min_{s \ge 1} f(\beta, s) = f(\beta, 1)$. Moreover, observe that $\lim_{s \to \infty} f(\beta, s) =
\beta - 1$. For binary measurement maps, our result states that for any $\delta > 0$, $(2 + \delta) s \log n$ measurements
suffice to recover an $s$-sparse signal when $n$ is sufficiently large. We also provide a very similar
result for block-sparse signals, stated in Section 3.2.
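
The properties of $f(\beta, s)$ listed above are easy to check numerically. The following is a minimal sketch, assuming the closed form $f(\beta, s) = \big[\sqrt{\beta/(2s) + \beta - 1} - \sqrt{\beta/(2s)}\big]^2$, which is the exponent produced by the parameter choices of Section 3.1; the function names are illustrative and do not come from the paper.

```python
import math


def f(beta: float, s: float) -> float:
    """Assumed exponent f(beta, s) for the Gaussian part of Theorem 1.1."""
    a = beta / (2.0 * s)
    return (math.sqrt(a + beta - 1.0) - math.sqrt(a)) ** 2


def gaussian_measurements(beta: float, s: int, n: int) -> int:
    """Smallest integer m satisfying m >= 2*beta*s*log(n) + s."""
    return math.ceil(2.0 * beta * s * math.log(n) + s)


def failure_bound(beta: float, s: int, n: int) -> float:
    """Upper bound 2 * n**(-f(beta, s)) on the probability that recovery fails."""
    return 2.0 * n ** (-f(beta, s))


if __name__ == "__main__":
    beta, n = 2.0, 10_000
    for s in (1, 5, 25, 125):
        m = gaussian_measurements(beta, s, n)
        print(f"s={s:4d}  m={m:6d}  f={f(beta, s):.4f}  failure<={failure_bound(beta, s, n):.3e}")
```

For fixed $\beta$, the printed exponent grows with $s$ and approaches $\beta - 1$, in line with the limit stated above.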

Our third result concerns the recovery of a low-rank matrix. 
Theorem 1.2 Let $X_0$ be an arbitrary $n_1 \times n_2$ rank-$r$ matrix and $\|\cdot\|_{\mathcal{A}}$ be the matrix nuclear norm.
For a Gaussian measurement map with $m \ge \beta r(3n_1 + 3n_2 - 5r)$ for some $\beta > 1$, the recovery is
exact with probability at least $1 - 2e^{(1-\beta)n/8}$, where $n = \max(n_1, n_2)$.
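
To get a feel for the bound in Theorem 1.2, the short sketch below compares it with the number of degrees of freedom $r(n_1 + n_2 - r)$ of a rank-$r$ matrix (the dimension $d_T$ used in Section 3.3). The function names are illustrative only.

```python
import math


def lowrank_measurements(beta: float, r: int, n1: int, n2: int) -> float:
    """Measurement threshold beta * r * (3*n1 + 3*n2 - 5*r) from Theorem 1.2."""
    return beta * r * (3 * n1 + 3 * n2 - 5 * r)


def degrees_of_freedom(r: int, n1: int, n2: int) -> int:
    """Number of parameters of an n1 x n2 matrix of rank r."""
    return r * (n1 + n2 - r)


def lowrank_failure_bound(beta: float, n1: int, n2: int) -> float:
    """Bound 2 * exp((1 - beta) * n / 8) with n = max(n1, n2)."""
    return 2.0 * math.exp((1.0 - beta) * max(n1, n2) / 8.0)


if __name__ == "__main__":
    n, beta = 200, 1.5
    for r in (1, 5, 20):
        m = lowrank_measurements(beta, r, n, n)
        dof = degrees_of_freedom(r, n, n)
        print(f"r={r:3d}  m={m:9.0f}  dof={dof:6d}  oversampling={m / dof:.2f}")
```

For square matrices and $\beta = 1$ the threshold reduces to $r(6n - 5r)$, and the ratio to the degrees of freedom is always below $3\beta$.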

Our results are 1) nonasymptotic and 2) demonstrate sharp constants for sparse signal and low- 
rank matrix recovery, perhaps the two most important cases in the general model reconstruction 
framework. Further, our bounds are proven using elementary concepts from convex analysis and 
probability theory. In fact, the most elaborate result from probability that we employ concerns the 
largest singular value of a Gaussian random matrix, and this is only needed to analyze the rank 
minimization problem. 

We show in Section 2 that the same construction and analysis can be applied to prove Theorems
1.1 and 1.2. The method, however, handles a variety of complexity regularizers including the $\ell_1/\ell_2$-norm
as well. When specialized in Section 3, we demonstrate sharp constants for exact model
reconstruction in all three of these cases ($\ell_1$, $\ell_1/\ell_2$, and nuclear norms). We conclude the paper
with a brief discussion of how to extend these results to other measurement ensembles. Indeed, with
very minor modifications, we can achieve almost the same constants for subgaussian measurement
ensembles in some settings such as sign matrices, as reflected by the second part of Theorem 1.1.

2 Dual Multipliers and Decomposable Regularizers 

Definition 2.1 The dual norm is defined as

$$\|x\|_{\mathcal{A}}^* = \sup\{\langle x, a\rangle : \|a\|_{\mathcal{A}} \le 1\}. \qquad (2.1)$$

A consequence of the definition is the well-known and useful dual-norm inequality

$$|\langle x, y\rangle| \le \|x\|_{\mathcal{A}} \, \|y\|_{\mathcal{A}}^*. \qquad (2.2)$$

The supremum in (2.1) is always achieved and thus the dual norm inequality (2.2) is tight in the
sense that for any $x$, there is a corresponding $y$ that achieves equality. Additionally, it is clear from
the definition that the subdifferential of $\|\cdot\|_{\mathcal{A}}$ at $x$ is $\{v : \langle v, x\rangle = \|x\|_{\mathcal{A}},\ \|v\|_{\mathcal{A}}^* \le 1\}$.
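
As a concrete instance, take $\|\cdot\|_{\mathcal{A}}$ to be the $\ell_1$ norm, whose dual is the $\ell_\infty$ norm. The sketch below (helper names are ours) builds a vector attaining the supremum in (2.1) and a vector achieving equality in (2.2).

```python
def l1_norm(x):
    return sum(abs(v) for v in x)


def linf_norm(x):
    return max(abs(v) for v in x)


def inner(x, y):
    return sum(a * b for a, b in zip(x, y))


def dual_maximizer_l1(x):
    """Return a with ||a||_1 <= 1 attaining <x, a> = ||x||_inf in (2.1)."""
    j = max(range(len(x)), key=lambda i: abs(x[i]))
    a = [0.0] * len(x)
    a[j] = 1.0 if x[j] >= 0 else -1.0
    return a


def tight_partner_l1(x):
    """Return y with ||y||_inf <= 1 achieving equality in (2.2): y = sgn(x)."""
    return [float((v > 0) - (v < 0)) for v in x]


if __name__ == "__main__":
    x = [0.5, -3.0, 2.0]
    a = dual_maximizer_l1(x)
    y = tight_partner_l1(x)
    print(inner(x, a), linf_norm(x))                # both equal ||x||_inf
    print(inner(x, y), l1_norm(x) * linf_norm(y))   # equality in (2.2)
```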

2.1 Decomposable Norms 

We will restrict our attention to norms whose subdifferential has very special structure effectively
penalizing "complex" solutions. In a similar spirit to [15], the following definition summarizes the
necessary properties of a good complexity regularizer:

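Definition 2.2 A norm $\|\cdot\|_{\mathcal{A}}$ is decomposable at $x_0$ if there is a subspace $T \subset \mathbb{R}^n$ and a vector
$e \in T$ such that the subdifferential at $x_0$ has the form

$$\partial \|x_0\|_{\mathcal{A}} = \{z \in \mathbb{R}^n : \mathcal{P}_T(z) = e \ \text{and}\ \|\mathcal{P}_{T^\perp}(z)\|_{\mathcal{A}}^* \le 1\}$$

and for any $w \in T^\perp$, we have

$$\|w\|_{\mathcal{A}} = \sup_{v \in T^\perp,\ \|v\|_{\mathcal{A}}^* \le 1} \langle v, w\rangle.$$

Above, $\mathcal{P}_T$ (resp. $\mathcal{P}_{T^\perp}$) is the orthogonal projection onto $T$ (resp. the orthogonal complement of $T$).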
When a norm is decomposable at $x_0$, the norm essentially penalizes elements in $T^\perp$ independently
from $x_0$. The most common decomposable regularizer is the $\ell_1$ norm on $\mathbb{R}^n$. In this case,
if $x_0$ is an $s$-sparse vector, then $T$ denotes the set of coordinates where $x_0$ is nonzero and $T^\perp$ the
complement of $T$ in $\{1, \ldots, n\}$. We denote by $x_{0,T}$ the restriction of $x_0$ to $T$ and by $\mathrm{sgn}(x_{0,T})$ the
vector with $\pm 1$ entries depending upon the signs of those of $x_{0,T}$. The dual norm to the $\ell_1$ norm
is the $\ell_\infty$ norm. The subdifferential of the $\ell_1$ norm at $x_0$ is given by

$$\partial \|x_0\|_1 = \{z \in \mathbb{R}^n : \mathcal{P}_T(z) = \mathrm{sgn}(x_{0,T}) \ \text{and}\ \|\mathcal{P}_{T^\perp}(z)\|_\infty \le 1\}.$$

That is, $z$ is equal to the sign of $x_0$ on $T$ and has entries with magnitudes bounded above by 1 on
the orthogonal complement. As we will discuss in Section 3, the $\ell_1/\ell_2$ norm and the matrix nuclear
norm are also decomposable. The following lemma gives conditions under which $x_0$ is the unique
minimizer of (1.1).
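
The description of the $\ell_1$ subdifferential can be tested directly. The sketch below (with illustrative helper names) checks membership in the set and verifies the subgradient inequality $\|x\|_1 \ge \|x_0\|_1 + \langle z, x - x_0\rangle$ on random points.

```python
import random


def sign(v):
    return (v > 0) - (v < 0)


def in_l1_subdifferential(z, x0, tol=1e-12):
    """True if z equals sgn(x0) on the support of x0 and |z_j| <= 1 elsewhere."""
    for zj, xj in zip(z, x0):
        if xj != 0:
            if abs(zj - sign(xj)) > tol:
                return False
        elif abs(zj) > 1.0 + tol:
            return False
    return True


def subgradient_gap(z, x0, x):
    """||x||_1 - ||x0||_1 - <z, x - x0>; nonnegative whenever z is a subgradient at x0."""
    l1 = lambda v: sum(abs(c) for c in v)
    return l1(x) - l1(x0) - sum(zj * (xj - x0j) for zj, xj, x0j in zip(z, x, x0))


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [2.0, 0.0, -1.0, 0.0]
    z = [1.0, 0.3, -1.0, -0.9]
    print(in_l1_subdifferential(z, x0))
    print(min(subgradient_gap(z, x0, [rng.uniform(-5, 5) for _ in x0]) for _ in range(1000)))
```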

Lemma 2.3 Suppose that $\Phi$ is injective on the subspace $T$ and that there exists a vector $y$ in the
image of $\Phi^*$ (the adjoint of $\Phi$) obeying

1. $\mathcal{P}_T(y) = e$, where $e$ is as in Definition 2.2,

2. $\|\mathcal{P}_{T^\perp}(y)\|_{\mathcal{A}}^* < 1$.

Then $x_0$ is the unique minimizer of (1.1).

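Proof The proof is an adaptation of a standard argument. Consider any perturbation $x_0 + h$
where $\Phi h = 0$. Since the norm is decomposable, there exists a $v \in T^\perp$ such that $\|v\|_{\mathcal{A}}^* \le 1$ and
$\langle v, \mathcal{P}_{T^\perp}(h)\rangle = \|\mathcal{P}_{T^\perp}(h)\|_{\mathcal{A}}$. Moreover, we have that $e + v$ is a subgradient of $\|\cdot\|_{\mathcal{A}}$ at $x_0$. Hence,

$$\begin{aligned}
\|x_0 + h\|_{\mathcal{A}} &\ge \|x_0\|_{\mathcal{A}} + \langle e + v, h\rangle \\
&= \|x_0\|_{\mathcal{A}} + \langle e + v - y, h\rangle \\
&= \|x_0\|_{\mathcal{A}} + \langle v - \mathcal{P}_{T^\perp}(y), \mathcal{P}_{T^\perp}(h)\rangle \\
&\ge \|x_0\|_{\mathcal{A}} + \left(1 - \|\mathcal{P}_{T^\perp}(y)\|_{\mathcal{A}}^*\right) \|\mathcal{P}_{T^\perp}(h)\|_{\mathcal{A}}.
\end{aligned}$$

The first equality holds because $y$ lies in the image of $\Phi^*$ and $\Phi h = 0$, so that $\langle y, h\rangle = 0$; the
second uses $\mathcal{P}_T(y) = e$. Since $\|\mathcal{P}_{T^\perp}(y)\|_{\mathcal{A}}^*$ is strictly less than one, this last inequality holds strictly
unless $\mathcal{P}_{T^\perp}(h) = 0$. But if $\mathcal{P}_{T^\perp}(h) = 0$, then $\mathcal{P}_T(h)$ must also be zero because we have assumed that
$\Phi$ is injective on $T$. This means that $h$ is zero, proving that $x_0$ is the unique minimizer of (1.1). ■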



2.2 Constructing a Dual Multiplier 

To construct a $y$ satisfying the conditions of Lemma 2.3, we follow the program developed in [10]
and followed by many researchers in the compressed sensing literature. Namely, we choose the least
squares solution of $\mathcal{P}_T(\Phi^* q) = e$, and then prove that $y := \Phi^* q$ has dual norm strictly less than 1
on $T^\perp$.

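Let $\Phi_T$ and $\Phi_{T^\perp}$ denote the restriction of $\Phi$ to $T$ and $T^\perp$ respectively. Let $d_T$ denote the
dimension of the space $T$. Observe that if $\Phi_T$ is injective, then

$$q = \Phi_T (\Phi_T^* \Phi_T)^{-1} e, \qquad (2.3)$$
$$\mathcal{P}_{T^\perp}(y) = \Phi_{T^\perp}^* q. \qquad (2.4)$$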
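
For the $\ell_1$ norm, this construction can be carried out explicitly on a small random instance. The following sketch uses only the Python standard library; the helper names (`solve`, `gaussian_map`, `dual_multiplier`) and the instance sizes are our own choices, not part of the paper.

```python
import math
import random


def solve(A, b):
    """Solve the square system A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (M[i][n] - sum(M[i][j] * c[j] for j in range(i + 1, n))) / M[i][i]
    return c


def gaussian_map(m, n, rng):
    """m x n matrix with i.i.d. N(0, 1/m) entries."""
    sd = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, sd) for _ in range(n)] for _ in range(m)]


def dual_multiplier(Phi, support, e):
    """Return (q, y) with q = Phi_T (Phi_T^* Phi_T)^{-1} e and y = Phi^* q."""
    m, n = len(Phi), len(Phi[0])
    gram = [[sum(Phi[i][a] * Phi[i][b] for i in range(m)) for b in support] for a in support]
    c = solve(gram, e)
    q = [sum(Phi[i][a] * ca for a, ca in zip(support, c)) for i in range(m)]
    y = [sum(Phi[i][j] * q[i] for i in range(m)) for j in range(n)]
    return q, y


if __name__ == "__main__":
    rng = random.Random(0)
    Phi = gaussian_map(40, 60, rng)
    support, e = [3, 17, 42], [1.0, -1.0, 1.0]
    q, y = dual_multiplier(Phi, support, e)
    off_support = max(abs(y[j]) for j in range(60) if j not in support)
    print("y on T:", [round(y[j], 6) for j in support])
    print("max |y_j| off T:", off_support)
```

By construction $y$ equals $e$ on $T$. If the printed off-support maximum is below 1, then $y$ satisfies both conditions of Lemma 2.3 for this instance.
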
The key fact we use to derive our bounds in this note is that, when $\Phi$ is a Gaussian map, $q$ and $\Phi_{T^\perp}$
are independent, no matter what $T$ is. This follows from the isotropy of the Gaussian ensemble.
This property is also true in the sparse-signal recovery setting whenever the columns of $\Phi$ are
independent. Another way to express the same idea is that given the value of $q$, one can infer the
distribution of $\mathcal{P}_{T^\perp}(y)$ with no knowledge of the values of the matrix $\Phi_T$.

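We assume in the remainder of this section that $\Phi$ is a Gaussian map. Conditioned on $q$,
$\mathcal{P}_{T^\perp}(y)$ is distributed as

$$\iota_{T^\perp}\, g,$$

where $\iota_{T^\perp}$ is an isometry from $\mathbb{R}^{n - d_T}$ onto $T^\perp$ and $g \sim \mathcal{N}\!\left(0, \frac{\|q\|_2^2}{m} I\right)$ (here and in the sequel, $\|\cdot\|_2$
is the $\ell_2$ norm). Also, $\Phi_T$ is injective as long as $m \ge d_T$ and to bound the probability that the
optimization problem (1.1) recovers $x_0$, we therefore only need to bound

$$\mathbb{P}\left[\|\mathcal{P}_{T^\perp}(y)\|_{\mathcal{A}}^* \ge 1\right] \le \mathbb{P}\left[\|\mathcal{P}_{T^\perp}(y)\|_{\mathcal{A}}^* \ge 1 \,\big|\, \|q\|_2 \le \tau\right] + \mathbb{P}\left[\|q\|_2 \ge \tau\right] \qquad (2.5)$$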

for some value of $\tau$ greater than 0. The first term in the upper bound will be analyzed on a
case-by-case basis in Section 3. As we have remarked, once we have conditioned on $q$, this term just
requires us to analyze the large deviations of Gaussian random variables in the dual norm. What
is more surprising is that the second term can be tightly upper bounded in a generic fashion for
the Gaussian ensemble, independent of the regularizer under study.
To see this, observe that $q$ has squared norm

$$\|q\|_2^2 = \langle e, (\Phi_T^* \Phi_T)^{-1} e\rangle.$$

By assumption, $(\Phi_T^* \Phi_T)^{-1}$ is a $d_T \times d_T$ inverse Wishart matrix with $m$ degrees of freedom and
covariance $m^{-1} I_{d_T}$. Since the Gaussian distribution is isotropic, we have that $\|q\|_2^2$ is distributed
as $\|e\|_2^2\, m B_{11}$, where $B_{11}$ is the first entry in the first column of an inverse Wishart matrix with $m$
degrees of freedom and covariance $I_{d_T}$.

To estimate the large deviations of $\|q\|_2$, it thus suffices to understand the large deviations of
$B_{11}$. A classical result in statistics states that $B_{11}$ is distributed as an inverse chi-squared random
variable with $m - d_T + 1$ degrees of freedom (see [14, page 72], for example).$^1$ We can thus lean on
tail bounds for the chi-squared distribution to control the magnitude of $B_{11}$. For each $t > 0$,

$$\begin{aligned}
\mathbb{P}\left[\|q\|_2 \ge \sqrt{\frac{m}{m - d_T + 1 - t}}\, \|e\|_2\right] &= \mathbb{P}\left[z \le m - d_T + 1 - t\right] \\
&\le \exp\left(-\frac{t^2}{4(m - d_T + 1)}\right).
\end{aligned} \qquad (2.6)$$

Here $z$ is a chi-squared random variable with $m - d_T + 1$ degrees of freedom, and the final inequality
follows from the standard tail bound for chi-square random variables (see, for example, [13]).
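
The lower-tail bound used in the last step is easy to sanity-check by simulation. The sketch below draws chi-squared samples through the standard library's gamma sampler (a chi-squared variable with $k$ degrees of freedom is a gamma variable with shape $k/2$ and scale 2); here $k$ plays the role of $m - d_T + 1$, and the sample sizes are arbitrary choices of ours.

```python
import math
import random


def chi2_lower_tail_bound(k: int, t: float) -> float:
    """Bound exp(-t^2 / (4k)) on P[z <= k - t] for z chi-squared with k degrees of freedom."""
    return math.exp(-t * t / (4.0 * k))


def empirical_lower_tail(k: int, t: float, trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of P[z <= k - t]."""
    hits = 0
    for _ in range(trials):
        z = rng.gammavariate(k / 2.0, 2.0)  # chi-squared with k degrees of freedom
        if z <= k - t:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(1)
    k = 20
    for t in (5.0, 10.0, 15.0):
        print(t, empirical_lower_tail(k, t, 20_000, rng), chi2_lower_tail_bound(k, t))
```
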
To summarize, we have proven the following 

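Proposition 2.4 Let $\|\cdot\|_{\mathcal{A}}$ be a decomposable regularizer at $x_0$ and let $t > 0$. Let $q$ and $y$ be
defined as in (2.3) and (2.4). Then $x_0$ is the unique optimal solution of (1.1) with probability at
least

$$1 - \mathbb{P}\left[\|\mathcal{P}_{T^\perp}(y)\|_{\mathcal{A}}^* \ge 1 \,\Big|\, \|q\|_2 \le \sqrt{\frac{m}{m - d_T + 1 - t}}\, \|e\|_2\right] - \exp\left(-\frac{t^2}{4(m - d_T + 1)}\right). \qquad (2.7)$$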



$^1$ The reader not familiar with this result can verify with linear algebra that $1/B_{11}$ is equal to the squared distance
between the first column of $\Phi_T$ and the linear space spanned by all the others. This squared distance is a chi-squared
random variable with $m - d_T + 1$ degrees of freedom.
3 Bounds



Using Proposition 2.4, we can now derive non-asymptotic bounds for exact recovery of sparse
vectors, block-sparse vectors, and low-rank matrices in a unified fashion.

3.1 Compressed Sensing in the Gaussian Ensemble 

Let $x_0$ be an $s$-sparse vector in $\mathbb{R}^n$. In this case, $T$ denotes the set of coordinates where $x_0$ is
nonzero and $T^\perp$ the complement of $T$ in $\{1, \ldots, n\}$. As previously discussed, the dual norm to the
$\ell_1$ norm is the $\ell_\infty$ norm and the subdifferential of the $\ell_1$ norm at $x_0$ is given by

$$\partial \|x_0\|_1 = \{z \in \mathbb{R}^n : \mathcal{P}_T(z) = \mathrm{sgn}(x_{0,T}) \ \text{and}\ \|\mathcal{P}_{T^\perp}(z)\|_\infty \le 1\}.$$

Here, $\dim(T) = s$, the sparsity of $x_0$, and $e = \mathrm{sgn}(x_0)$ so that $\|e\|_2 = \sqrt{s}$.

For $m \ge s$, set $q$ and $y$ as in (2.3) and (2.4). To apply Proposition 2.4, we only need to
estimate the probability that $\|\mathcal{P}_{T^\perp}(y)\|_\infty$ exceeds 1 conditioned on the event that $\|q\|_2$ is bounded.
Conditioned on $q$, the components of $\mathcal{P}_{T^\perp}(y)$ in $T^\perp$ are i.i.d. $\mathcal{N}(0, \|q\|_2^2/m)$. Hence, for any $\tau > 0$,
the union bound gives

$$\begin{aligned}
\mathbb{P}\left[\|\mathcal{P}_{T^\perp}(y)\|_\infty \ge 1 \,\big|\, \|q\|_2 \le \tau\right] &\le (n - s)\, \mathbb{P}\left[|z| \ge \sqrt{m}/\tau\right] \\
&\le n \exp\left(-\frac{m}{2\tau^2}\right),
\end{aligned} \qquad (3.1)$$

where $z \sim \mathcal{N}(0, 1)$. We have made use above of the elementary inequality $\mathbb{P}(|z| \ge t) \le e^{-t^2/2}$ which
holds for all $t \ge 0$. For $\beta > 1$, select

$$\tau = \sqrt{\frac{s\, m}{m - s + 1 - t}} \quad \text{with} \quad t = 2\beta \log(n) \left(\sqrt{1 + \frac{2s(\beta - 1)}{\beta}} - 1\right).$$

Here, $t$ is chosen to make the two exponential terms in our probability equal to each other. We can
put all of the parameters together and plug (3.1) into (2.7). For $m = 2\beta s \log n + s$, $\beta > 1$, a bit of
algebra gives the first part of Theorem 1.1.
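
The balancing of the two exponential terms can be illustrated numerically. The sketch below assumes the choices $\tau = \sqrt{sm/(m - s + 1 - t)}$ and $t = 2\beta \log(n)\big(\sqrt{1 + 2s(\beta-1)/\beta} - 1\big)$ with $m = 2\beta s \log n + s$, and evaluates the union-bound term from (3.1) and the chi-squared term from (2.6). Function names are illustrative.

```python
import math


def balanced_parameters(beta: float, s: int, n: int):
    """Return (m, t, tau) for the Gaussian l1 analysis with m = 2*beta*s*log(n) + s."""
    log_n = math.log(n)
    m = 2.0 * beta * s * log_n + s
    t = 2.0 * beta * log_n * (math.sqrt(1.0 + 2.0 * s * (beta - 1.0) / beta) - 1.0)
    tau = math.sqrt(s * m / (m - s + 1.0 - t))
    return m, t, tau


def failure_terms(beta: float, s: int, n: int):
    """Return the two terms whose sum bounds the failure probability."""
    m, t, tau = balanced_parameters(beta, s, n)
    union_term = n * math.exp(-m / (2.0 * tau ** 2))        # right-hand side of (3.1)
    chi2_term = math.exp(-t ** 2 / (4.0 * (m - s + 1.0)))   # right-hand side of (2.6) with d_T = s
    return union_term, chi2_term


if __name__ == "__main__":
    for s in (5, 10, 50):
        u, c = failure_terms(2.0, s, 1000)
        print(f"s={s:3d}  union term={u:.4e}  chi-squared term={c:.4e}")
```

The two printed terms are of the same order; the small discrepancy comes from the additive constant in $m - s + 1$.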

3.2 Block-Sparsity in the Gaussian Ensemble 

In simultaneous sparse estimation, signals are block-sparse in the sense that $\mathbb{R}^n$ can be decomposed
into a direct sum of subspaces

$$\mathbb{R}^n = \bigoplus_{b=1}^{M} V_b \qquad (3.2)$$

with each Vb having dimension B 0[T7] . We assume that signals of interest are only nonzero on a 
few of the $V_b$'s and search for a solution which minimizes the norm

$$\|x\|_{\ell_1/\ell_2} = \sum_{b=1}^{M} \|x_b\|_2,$$

where $x_b$ denotes the projection of $x$ onto $V_b$.
Suppose $x_0$ is block-sparse with $k$ active blocks. $T$ here denotes the coordinates associated with
the groups where $x_0$ has nonzero energy. $T^\perp$ is equal to all of the coordinates of the groups where
$x_0 = 0$. The dual norm to the $\ell_1/\ell_2$ norm is the $\ell_\infty/\ell_2$ norm

$$\|x\|_{\ell_\infty/\ell_2} = \max_{1 \le b \le M} \|x_b\|_2.$$
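
A small sketch of the two block norms follows. It assumes, purely for illustration, that the subspaces $V_b$ are consecutive coordinate blocks of equal size $B$; the function names are ours.

```python
import math


def blocks(x, B):
    """Split x into consecutive coordinate blocks of size B (an assumed layout of the V_b)."""
    return [x[i:i + B] for i in range(0, len(x), B)]


def l1_l2_norm(x, B):
    """Sum of the Euclidean norms of the blocks."""
    return sum(math.sqrt(sum(v * v for v in blk)) for blk in blocks(x, B))


def linf_l2_norm(x, B):
    """Largest Euclidean norm among the blocks (the dual norm)."""
    return max(math.sqrt(sum(v * v for v in blk)) for blk in blocks(x, B))


if __name__ == "__main__":
    x = [3.0, 4.0, 0.0, 0.0, 5.0, 12.0]
    y = [1.0, 0.0, 0.0, 1.0, 0.0, -1.0]
    dot = sum(a * b for a, b in zip(x, y))
    print(l1_l2_norm(x, 2), linf_l2_norm(x, 2))
    print(abs(dot), "<=", l1_l2_norm(x, 2) * linf_l2_norm(y, 2))
```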



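The subdifferential of the $\ell_1/\ell_2$ norm at $x_0$ is given by

$$\partial \|x_0\|_{\ell_1/\ell_2} = \left\{ z \in \mathbb{R}^n : \mathcal{P}_T(z) = \sum_{b\,:\,x_{0,b} \ne 0} \frac{x_{0,b}}{\|x_{0,b}\|_2} \ \text{and}\ \|\mathcal{P}_{T^\perp}(z)\|_{\ell_\infty/\ell_2} \le 1 \right\}.$$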



Much like in the $\ell_1$ case, $T$ denotes the span of the set of active subspaces and $T^\perp$ is the span of the
inactive subspaces. In this formulation, $\dim(T) = kB$ and

$$e = \sum_{b\,:\,x_{0,b} \ne 0} \frac{x_{0,b}}{\|x_{0,b}\|_2}.$$

Note also that $\|e\|_2 = \sqrt{k}$.

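With the parameters we have just defined, we can define $q$ and $y$ by (2.3) and (2.4). If we again
condition on $\|q\|_2$, the components of $y$ on $T^\perp$ are i.i.d. $\mathcal{N}(0, \|q\|_2^2/m)$. Using the union bound, we
have

$$\mathbb{P}\left[\|\mathcal{P}_{T^\perp}(y)\|_{\ell_\infty/\ell_2} \ge 1 \,\big|\, \|q\|_2 \le \tau\right] \le \sum_{b\,:\,x_{0,b} = 0} \mathbb{P}\left[\|y_b\|_2 \ge 1 \,\big|\, \|q\|_2 \le \tau\right]. \qquad (3.3)$$

Conditioned on $q$, $\frac{m}{\|q\|_2^2} \|y_b\|_2^2$ is identically distributed as a chi-squared random variable with $B$
degrees of freedom. Letting $u = \frac{\sqrt{m}}{\|q\|_2} \|y_b\|_2$, the Borell inequality [21, Proposition 5.34] gives

$$\mathbb{P}(u \ge \mathbb{E} u + t) \le e^{-t^2/2}.$$

Since $\mathbb{E} u \le \sqrt{B}$, we have $\mathbb{P}(u \ge \sqrt{B} + t) \le e^{-t^2/2}$. Using this inequality, with

$$\tau = \sqrt{\frac{k\, m}{m - kB + 1 - t}},$$

we have that the probability of failure is upper bounded by

$$M \exp\left(-\frac{1}{2}\left[\sqrt{\frac{m - kB + 1 - t}{k}} - \sqrt{B}\right]^2\right) + \exp\left(-\frac{t^2}{4(m - kB + 1)}\right). \qquad (3.4)$$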



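Choosing $m \ge (1 + \beta) k (\sqrt{B} + \sqrt{2 \log M})^2 + kB$ and setting $t = (\beta/2) k (\sqrt{B} + \sqrt{2 \log M})^2$, we can
then upper bound (3.4) by

$$M \exp\left(-\frac{1}{2}\left[\sqrt{1 + \beta/2}\,(\sqrt{B} + \sqrt{2 \log M}) - \sqrt{B}\right]^2\right) + \exp\left(-\frac{\beta^2}{16(1 + \beta)}\, k (\sqrt{B} + \sqrt{2 \log M})^2\right) \le M^{-\beta/4} + M^{-\beta^2/(8 + 8\beta)}.$$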



This proves the following 
Theorem 3.1 Let $x_0$ be a block-sparse signal with $M$ blocks of size $B$ and $k$ active blocks under
the decomposition (3.2). Let $\|\cdot\|_{\mathcal{A}}$ be the $\ell_1/\ell_2$ norm. For Gaussian measurement maps $\Phi$ with

$$m \ge (1 + \beta) k (\sqrt{B} + \sqrt{2 \log M})^2 + kB$$

the recovery is exact with probability at least $1 - M^{-\beta/4} - M^{-\beta^2/(8 + 8\beta)}$.

The bound on $m$ obtained by this theorem is identical to that of [18], and is, to our knowledge, the
tightest known non-asymptotic bound for block-sparse signals. For example, when the block size
$B$ is much greater than $\log M$, the result asserts that roughly $2kB$ measurements are sufficient for
the convex programming to be exact. Since there are $kB$ degrees of freedom, one can see that this
is quite tight.
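
The $2kB$ regime can be checked by evaluating the threshold of Theorem 3.1 directly. In the sketch below, the success probability is taken to be one minus the failure bound $M^{-\beta/4} + M^{-\beta^2/(8+8\beta)}$ derived before the theorem; function names and the sample parameters are ours.

```python
import math


def block_measurements(beta: float, k: int, B: int, M: int) -> float:
    """Threshold (1 + beta) * k * (sqrt(B) + sqrt(2 log M))^2 + k * B."""
    return (1.0 + beta) * k * (math.sqrt(B) + math.sqrt(2.0 * math.log(M))) ** 2 + k * B


def block_success_probability(beta: float, M: int) -> float:
    """One minus the failure bound M^(-beta/4) + M^(-beta^2 / (8 + 8 beta))."""
    return 1.0 - M ** (-beta / 4.0) - M ** (-beta ** 2 / (8.0 + 8.0 * beta))


if __name__ == "__main__":
    k, M = 3, 10
    for B in (10, 1_000, 1_000_000):
        m = block_measurements(0.01, k, B, M)
        print(f"B={B:8d}  m/(kB)={m / (k * B):.3f}")
    print(block_success_probability(8.0, 100))
```

As $B$ grows relative to $\log M$ and $\beta$ is taken small, the ratio $m/(kB)$ approaches 2. A small $\beta$ makes the probability guarantee weak, so in practice the two quantities have to be traded off.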

Note that the theorem gives a recovery result for sparse vectors by setting $B = 1$, $k = s$,
and $M = n$. In this case, Theorem 3.1 gives a slightly looser bound and requires a slightly more
complicated argument as compared to Theorem 1.1. However, Theorem 3.1 provides bounds for
more general types of signals, and we note that the same analysis would handle other $\ell_1/\ell_p$ block
regularization schemes defined as $\|x\|_{\ell_1/\ell_p} = \sum_{b=1}^{M} \|x_b\|_p$ with $p \in [2, \infty]$. Indeed, the $\ell_1/\ell_p$ norm
is decomposable and its dual is the $\ell_\infty/\ell_q$ norm with $1/p + 1/q = 1$. The only adjustment would
consist in bounding $\|y_b\|_q$; up to a scaling factor, this is the $\ell_q$ norm of a vector of independent
standard normals, and our analysis goes through. We omit the details.

3.3 Low-Rank Matrix Recovery in the Gaussian Ensemble 

To apply our results to recovering low-rank matrices, we need a little bit more notation, but the
argument is principally the same. Let $X_0$ be an $n_1 \times n_2$ matrix of rank $r$ with singular value
decomposition $U \Sigma V^*$. Without loss of generality, impose the conventions $n_1 \le n_2$, $\Sigma$ is $r \times r$, $U$
is $n_1 \times r$, $V$ is $n_2 \times r$.

In the low-rank matrix reconstruction problem, the subspace $T$ is the set of matrices of the form
$UY^* + XV^*$ where $X$ and $Y$ are arbitrary $n_1 \times r$ and $n_2 \times r$ matrices. The span of matrices of the
form $UY^*$ has dimension $n_2 r$, the span of $XV^*$ has dimension $n_1 r$, and the intersection of these
two spans has dimension $r^2$. Hence, we have $d_T = \dim(T) = r(n_1 + n_2 - r)$. $T^\perp$ is the subspace
of matrices spanned by the family $(xy^*)$, where $x$ (respectively $y$) is any vector orthogonal to $U$
(respectively $V$). The spectral norm denoted by $\|\cdot\|$ is dual to the nuclear norm. The subdifferential
of the nuclear norm at $X_0$ is given by

$$\partial \|X_0\|_* = \{Z : \mathcal{P}_T(Z) = UV^* \ \text{and}\ \|\mathcal{P}_{T^\perp}(Z)\| \le 1\}.$$

Note that the Euclidean norm of $UV^*$ is equal to $\sqrt{r}$.

For matrices, a Gaussian measurement map takes the form of a linear operator whose $i$th
component is given by

$$[\Phi(Z)]_i = \mathrm{Tr}(\Phi_i^* Z).$$

Above, $\Phi_i$ is an $n_1 \times n_2$ random matrix with i.i.d., zero-mean Gaussian entries with variance $1/m$.
This is equivalent to defining $\Phi$ as an $m \times (n_1 n_2)$ dimensional matrix acting on $\mathrm{vec}(Z)$, the vector
composed of the columns of $Z$ stacked on top of one another. In this case, the dual multiplier is a
matrix taking the form

$$Y = \Phi^* \Phi_T (\Phi_T^* \Phi_T)^{-1} (UV^*).$$
Here, $\Phi_T$ is the restriction of $\Phi$ to the subspace $T$. Concretely, one could define a basis for $T$ and
write out $\Phi_T$ as an $m \times d_T$ dimensional matrix. Note that none of the abstract setup from Section 2.2
changes for the matrix recovery problem: $Y$ exists as soon as $m \ge \dim(T) = r(n_1 + n_2 - r)$ and
$\mathcal{P}_T(Y) = UV^*$ as desired. We need only guarantee that $\|\mathcal{P}_{T^\perp}(Y)\| < 1$. We still have that

$$\mathcal{P}_{T^\perp}(Y) = \sum_{i=1}^{m} q_i\, \mathcal{P}_{T^\perp}(\Phi_i),$$

where $q = \Phi_T (\Phi_T^* \Phi_T)^{-1} (UV^*)$ is given by (2.3) and, importantly, $q$ and $\mathcal{P}_{T^\perp}(\Phi_i)$ are independent
for all $i$.

With such a definition, we can again straightforwardly apply (2.7) once we obtain an estimate of
$\|\mathcal{P}_{T^\perp}(Y)\|$ conditioned on $q$. Observe that $\mathcal{P}_{T^\perp}(Y) = \mathcal{P}_{U^\perp} Y \mathcal{P}_{V^\perp}$, where $\mathcal{P}_{U^\perp}$ (respectively $\mathcal{P}_{V^\perp}$) is
a projection matrix onto the orthogonal complement of $U$ (respectively $V$). It follows that $\mathcal{P}_{T^\perp}(Y)$
is identically distributed to a rotation of an $(n_1 - r) \times (n_2 - r)$ Gaussian random matrix whose
entries have mean zero and variance $\|q\|_2^2/m$. Using the Davidson-Szarek concentration inequality
for the extreme singular values of Gaussian random matrices [6], we have

$$\mathbb{P}\left[\|\mathcal{P}_{T^\perp}(Y)\| \ge 1 \,\big|\, \|q\|_2 \le \tau\right] \le \exp\left(-\frac{1}{2}\left[\frac{\sqrt{m}}{\tau} - \sqrt{n_1 - r} - \sqrt{n_2 - r}\right]^2\right).$$



We are again in a position to apply (2.7). To guarantee matrix recovery, with
$\tau = \sqrt{\frac{r\, m}{m - d_T + 1 - t}}$, we thus need

$$\sqrt{\frac{m - d_T + 1 - t}{r}} - \sqrt{n_1 - r} - \sqrt{n_2 - r} \ge 0.$$

This occurs if

$$m \ge r(n_1 + n_2 - r) + \left[\sqrt{r(n_1 - r)} + \sqrt{r(n_2 - r)}\right]^2 + t - 1.$$

But since $(a + b)^2 \le 2(a^2 + b^2)$, we can upper bound

$$\left[\sqrt{r(n_1 - r)} + \sqrt{r(n_2 - r)}\right]^2 \le 2r(n_1 + n_2 - 2r).$$

Setting f = (V2r + 1 - - l)(3rti + 3n 2 - 5r) in ([2771) then yields Theorem O 
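
The simplification from the exact sufficient condition to the expression $r(3n_1 + 3n_2 - 5r)$ can be checked on a grid of dimensions. The sketch below compares the two right-hand sides for a given deviation parameter $t$; the function names are illustrative.

```python
import math


def sufficient_m(r: int, n1: int, n2: int, t: float) -> float:
    """r(n1 + n2 - r) + [sqrt(r(n1 - r)) + sqrt(r(n2 - r))]^2 + t - 1."""
    cross = (math.sqrt(r * (n1 - r)) + math.sqrt(r * (n2 - r))) ** 2
    return r * (n1 + n2 - r) + cross + t - 1.0


def simplified_m(r: int, n1: int, n2: int, t: float) -> float:
    """Looser threshold r(3 n1 + 3 n2 - 5 r) + t - 1 obtained from (a + b)^2 <= 2(a^2 + b^2)."""
    return r * (3 * n1 + 3 * n2 - 5 * r) + t - 1.0


if __name__ == "__main__":
    for (r, n1, n2) in [(2, 20, 20), (2, 20, 80), (10, 30, 300)]:
        print(r, n1, n2, sufficient_m(r, n1, n2, 1.0), simplified_m(r, n1, n2, 1.0))
```

The two thresholds coincide for square matrices and the simplified one is strictly larger when $n_1 \ne n_2$.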

4 Discussion 

We note that with minor modifications, the results for sparse and block-sparse signals can be
extended to measurement matrices whose entries are i.i.d. subgaussian random variables. In this
case, we can no longer use the theory of inverse Wishart matrices, but $\Phi_{T^\perp}$ and $\Phi_T$ are still
independent, and we can bound the norm of $q$ using bounds on the smallest singular value of
rectangular matrices. For example, Theorem 39 in [21] asserts that there exist positive constants $\delta$
and $\gamma$ such that the smallest singular value obeys the deviation inequality

$$\mathbb{P}\left[\sigma_{\min}(\Phi_T) \le 1 - \delta \sqrt{d_T/m} - t\right] \le e^{-\gamma m t^2} \qquad (4.1)$$

for $t > 0$.

We use this concentration inequality to prove the second part of Theorem 1.1. Since
$\|\Phi_T (\Phi_T^* \Phi_T)^{-1}\| = \sigma_{\min}(\Phi_T)^{-1}$, we have that

$$\|q\|_2 \le \frac{\sqrt{s}}{1 - \delta \sqrt{s/m} - t} := \rho$$

with probability at least $1 - e^{-\gamma m t^2}$. This is the analog of (2.6). Now whenever $\|q\|_2 \le \rho$, Hoeffding's
inequality [12] implies that (3.1) still holds. Thus, we are in the same position as before, and obtain

$$\mathbb{P}\left[\|\mathcal{P}_{T^\perp}(y)\|_\infty \ge 1\right] \le 2(n - s) \exp\left(-\frac{m}{2\rho^2}\right) + \exp(-\gamma m t^2).$$

Setting $t = \varepsilon/2$ proves the second part of Theorem 1.1.
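
The Hoeffding step can be illustrated by simulation. For a fixed vector $q$ and a column of i.i.d. $\pm m^{-1/2}$ signs, the corresponding entry of $y$ is a weighted sum of independent signs, and Hoeffding's inequality bounds the probability that its magnitude reaches 1 by $2\exp(-m/(2\|q\|_2^2))$. The sketch below compares this bound with an empirical frequency; the particular $q$ and the number of trials are arbitrary choices of ours.

```python
import math
import random


def hoeffding_bound(m: int, q_norm: float) -> float:
    """Bound 2 * exp(-m / (2 * ||q||_2^2)) on P[|<phi_j, q>| >= 1] for a sign column phi_j."""
    return 2.0 * math.exp(-m / (2.0 * q_norm ** 2))


def empirical_exceedance(q, trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of P[|sum_i phi_i q_i| >= 1] with phi_i = +/- m^{-1/2}."""
    m = len(q)
    scale = 1.0 / math.sqrt(m)
    hits = 0
    for _ in range(trials):
        y = sum(qi * (scale if rng.random() < 0.5 else -scale) for qi in q)
        if abs(y) >= 1.0:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(2)
    m = 50
    q = [3.0 / math.sqrt(m)] * m   # a fixed vector with ||q||_2 = 3
    print(empirical_exceedance(q, 20_000, rng), hoeffding_bound(m, 3.0))
```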

For block-sparse signals, a similar argument would apply. The only caveat is that we would
need the following concentration bound which follows from Lemma 5.2 in [1]: Let $M$ be a $d_1 \times d_2$
dimensional matrix with i.i.d. entries taking on values $\pm 1$ with equal probability. Let $v$ be a fixed
vector in $\mathbb{R}^{d_2}$. Then

P[||M«|| 2 > 1] < exp f- Mi!=* ' 



provided ||u|| < V"i- Plugging this bound into fl3.3f) gives an analogous threshold for block-sparse 
signals in the Bernoulli model: 

Theorem 4.1 Let $x_0$ be a block-sparse signal with $M$ blocks and $k$ active blocks under the
decomposition (3.2). Let $\|\cdot\|_{\mathcal{A}}$ be the $\ell_1/\ell_2$ norm. Let $\beta > 1$ and $\varepsilon \in (0,1)$. For binary measurement
maps $\Phi$ with i.i.d. entries taking on values $\pm m^{-1/2}$ with equal probability, there exist numerical
constants $c_0$ and $c_1$ such that if $M \ge \exp(c_0/\varepsilon^2)$ and $m \ge 4k\beta(1 - \varepsilon)^{-2} \log M + 2kB$, the recovery
is exact with probability at least $1 - M^{1-\beta} - M^{-c_1 \beta \varepsilon}$.



For low-rank matrix recovery, the situation is more delicate. With general subgaussian mea- 
surement matrices, we no longer have independence between the action on the subspaces T and 
T 1 - unless the singular vectors somehow align serendipitously with the coordinate axes. In this 
case, it unfortunately appears that we need to resort to more complicated arguments and will likely 
be unable to attain such small constants through the dual multiplier without a conceptually new 
argument. 